:orphan:

Sklearn Basics 1: Train, Evaluate and Deploy a Classifier
=========================================================

In this lesson, we will learn how to train, evaluate and deploy a
classifier with Khiops sklearn.

We start by importing the sklearn estimator ``KhiopsClassifier``:

.. code:: ipython3

    import os
    import pandas as pd
    from khiops import core as kh
    from khiops.sklearn import KhiopsClassifier

    # If there are any issues, you may check the Khiops status with the following command
    # kh.get_runner().print_status()

Training a Classifier
---------------------

We’ll train a classifier for the ``Iris`` dataset. This is a classical
dataset containing data of different plants belonging to the genus
*Iris*. It contains 150 records, 50 for each of the three *Iris*
variants: *Setosa*, *Virginica* and *Versicolor*. Each record contains
the length and the width of both the petal and the sepal of the plant.
The standard task, when using this dataset, is to construct a classifier
for the type of the *Iris*, based on the petal and sepal
characteristics.

To train a classifier with Khiops, we only need a dataframe, which we
load from a file. Let’s first save the location of this file into a
variable ``iris_data_file``, load it and take a look at its content:

.. code:: ipython3

    iris_data_file = os.path.join(kh.get_samples_dir(), "Iris", "Iris.txt")
    iris_df = pd.read_csv(iris_data_file, sep="\t")
    print("Iris data: first 5 records")
    iris_df.head()

.. parsed-literal::

    Iris data: first 5 records

.. parsed-literal::

       SepalLength  SepalWidth  PetalLength  PetalWidth        Class
    0          5.1         3.5          1.4         0.2  Iris-setosa
    1          4.9         3.0          1.4         0.2  Iris-setosa
    2          4.7         3.2          1.3         0.2  Iris-setosa
    3          4.6         3.1          1.5         0.2  Iris-setosa
    4          5.0         3.6          1.4         0.2  Iris-setosa

Before training the classifier, we split the data into the feature
matrix (sepal length, width, etc.) and the target vector containing the
labels (the ``Class`` column).

.. code:: ipython3

    X_iris_train = iris_df.drop("Class", axis=1)
    y_iris_train = iris_df["Class"]

Let’s check the contents of the feature matrix and the target vector:

.. code:: ipython3

    print("Features of the Iris dataset:")
    display(X_iris_train.head())
    print("")
    print("Label of the Iris dataset:")
    display(y_iris_train.head())

.. parsed-literal::

    Features of the Iris dataset:

.. parsed-literal::

       SepalLength  SepalWidth  PetalLength  PetalWidth
    0          5.1         3.5          1.4         0.2
    1          4.9         3.0          1.4         0.2
    2          4.7         3.2          1.3         0.2
    3          4.6         3.1          1.5         0.2
    4          5.0         3.6          1.4         0.2

.. parsed-literal::

    Label of the Iris dataset:

.. parsed-literal::

    0    Iris-setosa
    1    Iris-setosa
    2    Iris-setosa
    3    Iris-setosa
    4    Iris-setosa
    Name: Class, dtype: object

Let’s now train the classifier with the Khiops estimator
``KhiopsClassifier``. Its ``fit`` method returns the fitted model, ready
to classify new Iris plants.

**Note:** By default Khiops builds 10 decision trees. This is not
necessary for this tutorial, so we set ``n_trees=0``.

.. code:: ipython3

    khc_iris = KhiopsClassifier(n_trees=0)
    khc_iris.fit(X_iris_train, y_iris_train)

.. parsed-literal::

    KhiopsClassifier(n_trees=0)
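
The loading and splitting steps above can also be tried without access to the Khiops samples directory. The following is a minimal sketch, assuming only ``pandas`` is installed: it inlines the five records shown earlier as a tab-separated string and applies the same feature/target split. The names ``IRIS_SAMPLE``, ``split_features_target``, ``X_sample`` and ``y_sample`` are hypothetical and introduced here for illustration only. They are not part of Khiops.

```python
import io

import pandas as pd

# The first five records of Iris.txt as displayed in this lesson,
# inlined as tab-separated text with a header row (illustrative data only).
IRIS_SAMPLE = (
    "SepalLength\tSepalWidth\tPetalLength\tPetalWidth\tClass\n"
    "5.1\t3.5\t1.4\t0.2\tIris-setosa\n"
    "4.9\t3.0\t1.4\t0.2\tIris-setosa\n"
    "4.7\t3.2\t1.3\t0.2\tIris-setosa\n"
    "4.6\t3.1\t1.5\t0.2\tIris-setosa\n"
    "5.0\t3.6\t1.4\t0.2\tIris-setosa\n"
)


def split_features_target(df, target_column):
    """Hypothetical helper: return (feature matrix, target vector)."""
    X = df.drop(columns=[target_column])  # every column except the label
    y = df[target_column]  # the label column as a pandas Series
    return X, y


# Read the inline sample the same way the lesson reads Iris.txt
sample_df = pd.read_csv(io.StringIO(IRIS_SAMPLE), sep="\t")
X_sample, y_sample = split_features_target(sample_df, "Class")

print(X_sample.shape)
print(y_sample.name)
```

With the full dataset, the same split yields the ``X_iris_train`` and ``y_iris_train`` objects passed to ``fit`` above. Five records of a single class are of course not enough to train a meaningful classifier.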